Speculations: Providing Fault-tolerance and Recoverability in Distributed Environments
Authors
Abstract
Building safe and reliable programs is an important but difficult endeavor. The challenge is even greater in distributed environments, which may involve complex synchronization operations in the presence of process and network failures. Transactions are one of the earliest and simplest abstractions for reliable concurrent programming [2]. They provide fault isolation by guaranteeing the atomicity, consistency and durability of the actions performed as part of the transaction. Traditional transactions also provide isolation, which prevents the individual actions inside a transaction from being visible to the outside world until the transaction either aborts or commits. In this paper we consider the case where multiple processes may cooperate in a transaction, using message passing for communication. We relax the transactional isolation property to permit inter-process communication while executing inside the transaction. This model can improve performance and provide fault-tolerance for distributed applications. We call these transactions with relaxed isolation speculations, and we introduce them as programming language primitives. Our approach also resembles the traditional checkpointing and rollback mechanisms used to provide recoverability, but it differs in three ways. Speculations can provide programs with alternate execution paths upon rollback. Speculations are lightweight checkpoints that are stored in memory and can be coupled with a real checkpointing mechanism for increased reliability. Speculations are exposed as programming language primitives whose semantics are closer to those of transactions than to those of checkpoints. Our system adapts mechanisms designed for checkpointing/rollback systems [1] to ensure safe recovery lines in case of distributed speculation rollback.
While speculations are similar to the concept of lookahead-rollback introduced by the TimeWarp [3] mechanism, we extend the concept by allowing both explicit and implicit speculations through programming language extensions. The main contributions of this paper include: (1) the introduction of a new programming model based on speculations, (2) the definition of new speculative programming language constructs for distributed applications, (3) the description of a prototype implementation of speculations in the Linux kernel where speculative operations, including distributed commit and rollback, are transparent.
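The behaviour described in the abstract can be illustrated with a small single-machine sketch. It is not the paper's language extension or its Linux kernel prototype: the names `Process`, `Speculation`, `join`, `send`, `commit`, `abort` and `run_speculatively` are hypothetical, and message passing is modelled as a plain write into the receiver's state. The sketch only shows three ideas from the text, assuming each process's state fits in one dictionary: checkpoints kept in memory, a receiver being drawn into the sender's speculation when it gets a speculative message, and an alternate execution path taken after rollback.

```python
import copy


class Process:
    """A toy process whose whole state is one dictionary (hypothetical)."""

    def __init__(self, name, **state):
        self.name = name
        self.state = dict(state)


class Speculation:
    """Holds in-memory checkpoints for every process that joins (hypothetical)."""

    def __init__(self):
        self.checkpoints = {}  # Process -> saved copy of its state

    def join(self, proc):
        # Checkpoint a process the first time it becomes speculative.
        if proc not in self.checkpoints:
            self.checkpoints[proc] = copy.deepcopy(proc.state)

    def send(self, sender, receiver, key, value):
        # Relaxed isolation: the message leaves the sender before commit,
        # but the receiver is pulled into the same speculation first.
        self.join(sender)
        self.join(receiver)
        receiver.state[key] = value

    def commit(self):
        # Speculative state becomes permanent; the checkpoints are dropped.
        self.checkpoints.clear()

    def abort(self):
        # Roll every participant back to its checkpoint.
        for proc, saved in self.checkpoints.items():
            proc.state = saved
        self.checkpoints.clear()


def run_speculatively(spec, assumption_holds, speculative_path, alternate_path):
    """Run the optimistic path, then commit or roll back and take the alternate."""
    speculative_path(spec)
    if assumption_holds():
        spec.commit()
        return "committed"
    spec.abort()
    alternate_path()
    return "rolled back"


# Usage: process a speculatively transfers 30 units to process b.
a = Process("a", balance=100)
b = Process("b", inbox=None)


def optimistic(spec):
    spec.join(a)
    a.state["balance"] -= 30
    spec.send(a, b, "inbox", 30)


def fallback():
    # Alternate execution path taken after the rollback.
    b.state["inbox"] = 0


# The assumption turns out to be false, so both processes are rolled back.
outcome = run_speculatively(Speculation(), lambda: False, optimistic, fallback)
```

In this run the assumption fails, so `a` returns to its original balance, `b` loses the speculative message, and only the fallback's effect remains. A real distributed implementation would also have to agree on commit and rollback across machines, which this sketch does not attempt.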
Similar resources
HotDep ’06: Second Workshop on Hot Topics in System Dependability
Summarized by Geoffrey Lefebvre. Making Exception Handling Work. Bruno Cabral and Paulo Marques, University of Coimbra, Portugal. Presented by Bruno Cabral. Exceptions are the standard mechanism for error handling in modern programming languages. Unfortunately, dealing with exceptions is a tedious process. Programmers often avoid the issue by writing empty handlers to save time. Programmers who ...
Distributed Speculations: Providing Fault-tolerance and Improving Performance
This thesis introduces a new programming model based on speculative execution and it examines the use of speculations, a form of distributed transactions, for improving the performance, reliability and fault tolerance of distributed systems. A speculation is defined as a computation that is based on an assumption that is not validated before the computation is started. If the assumption is late...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
Recovery with limited replay: fault-tolerant processes in Linda
Research in the area of fault-tolerant distributed systems has focused to a large extent on data surviving various forms of failure. Replica control algorithms for maintaining mutually consistent replicas abound. However, comparatively little work has been devoted to making processes recoverable. In domains other than databases and transaction processing, fault-tolerance generally ...
A Theory of Nested Speculative Execution
Implementing distributed applications is a challenging task. Developers of such systems are confronted with issues like fault-tolerance, efficient synchronization mechanisms, and the correctness of the distributed code. This paper introduces a new programming model based on speculative execution that addresses these issues. Speculations provide distributed atomic rollback and enable optimistic ...